This document summarizes a presentation for lab meeting March 2019.
So what I want to do is to be able to predict lodgepole pine pollination phenology based on daily temperatures in the spring. Today I’m going to be talking primarily about the model I’m using to try to do that and I’m hoping for advice and feedback on building and interpreting the model.
But I’m going to start with some context for why I am interested in pollination phenology.
This photo is of a young lodgepole pine with 1st year female cones just peeking out at the top of the main shoot that are probably just finishing up their receptive period, 2nd year seed cones behind the main shoot, and pollen cones that have finished shedding pollen and are drying up and turning brown.
Lodgepole pine with 1st and 2nd year seed cones and 1st year pollen cones. Pollen cones have finished shedding pollen. CC BY 2.0 by S. Rae. Pinus contorta (Lodgepole Pine) - pollen cones Monynut Forest, Berwickshire, Scotland
Here’s a closer look at what male (pollen) cones look like when they’re ready to shed pollen and what female cones look like around receptivity. In the spring, as it warms up, male cones mature and begin shedding pollen and female cones develop and become receptive to pollen. While these are obviously not actual flowers, it’s typical in forestry to refer to these phenological stages as “flowering”.
Male strobili
Female strobili
Every year, pollen is shed from male strobili and is lofted through and above the forest canopy. Much of it stays in the local population, but some of it is whisked across the landscape - and it has the potential to travel very far indeed.
Spruce pollen at Eagle Lake. Jack Woods.
3000 km Potential white spruce and jack pine pollen travel
600 km Viable scots pine pollen travel
100 km Effective scots pine pollen travel (4.3% goes further)
Distances summarized in Kremer et al. 2012.
Lodgepole pine has an enormous range. How far can pollen move across that range?
Distribution map
Lodgepole has quite low (Fst 0.016)/among population variation, presumably because of its promiscuous pollen movement. But it also maintains local adaptation across its range. Considering just the BC portion of the range here, Liepe et al. identified 9 areas of broad local adaptation. Which populations are connected by pollen movement? How does that compare to patterns of local adaptation?
Local adaptation in lodgepole. From Liepe et al. 2016 in Evolutionary Applications
I think this question is especially interesting in the context of climate change. If trying to keep up with climate change geographically, lodgepole pine seeds don’t go very far. But pollen does! Which populations are connected by pollen movement right now? Is pollen moving from cold places to hot places or vice versa?
Figure from Shaw 2001
Long-distance gene flow could compensate for the long-generation time of trees, facilitating evolutionary change in a shifting climate. (Kremer et al. 2012)
Figuring out where pollen goes is very difficult, especially if you’re interested in connectivity between populations in a species with continental scale distributions. We can look at where the wind blows, in more or less detail, but there’s another important factor to consider.
If the pollen arrives in a population when the cones are not receptive, then there is no possibility it can pollinate any ovules. If we can model the timing of pollen shed and cone receptivity across the landscape, then we know which populations are and are not phenologically matched.
This makes figuring out where pollen goes easier, because we only have to worry about the wind movement between phenologically matched populations.
So with that in mind, today I’m going talk mostly about the process of building a model to predict pollination phenology in lodgepole pine. Ultimately this is part of my “figure out where pollen goes plan” but there are also some interesting questions about pollination phenology itself.
For example, lodgepole pine have many geographically varying phenological traits/local adaptation. Is that true for pollination phenology?
Also, as climate changes how will phenology shift? In seed orchards in BC, where much of the seed for replanting after logging is produced, they occassionally have trouble with protandry, where all of the pollen is shed before the cones become receptive. This is an operational problem, but has demographic and evolutionary consequences in the wild as well since it could reduce seedset and increase outcrossing
My phenology data is sourced from 8 seed orchards in British Columbia. They’re all in the interior. Today I’m going to show you data from 2 seed orchard sites, Kalamalka and Prince George. Both are in the interior but Prince George is further North. Kalamalka is generally a bit warmer than Prince George. I am not showing you all sites today because I was having trouble downloading gridded daily historical temperature data from Pacific Climate Impacts Consortium.
Seed Orchard locations
Site Temperatures
British Columbia is divided into a bunch of areas with relatively similar climates called Seed Planning Units. This map shows just a portion of the SPUs - the ones that represent the provenances of most of my data. Today, I’ll be showing you data from trees sourced from low elevations in the Bulkley Valley, Nelson, Prince George, and the Central Plateau regions and grown in Prince George (the orange region) and Kalamalka (a bit southeast of the orange region).
Parent tree source locations
So the data I’m going to show you today represents 678 ramets of 108 clones sourced from 4 regions and grown at 2 sites. The data was collected between 1997 and 2011. It was collected and scored in a couple different ways, but I harmonized and rescored all the data to be 1 (not yet flowering), 2 (flowering), 3 (finished flowering)
Phenophases (per sex)
So here’s a quick look at the phenophase data. The top graph shows the cumulative proportion of the 2nd phenophase - flowering. The bottom shows the same for the 3rd phenophase - finished flowering. So you can read the top graph as the proportion of trees that have transitioned from not yet flowering to flowering as heatsum increases. Similarly, you can read the bottom graph as the proportion of trees that have transitioned from flowering to done flowering.
Female phenological states are represented with purple lines and male with green. Kalamalka, the more southern site is shown with dotted lines and Prince George is shown with solid lines
Eyeball analysis
I used an ordered logistic regression. This is a generalization of a generalized linear model. It’s built by attaching a cumulative link function to a categorical outcome distribution. I’m going to talk about the model in a little bit of detail because it’s not super commonly used in ecology.
Basically, you get a logistic model for each transition between categories and, very importantly, it can keep the categories in order!
Begin with a series of logistic regressions for \(K=3\) categories, calculating the likelihood of outcomes \(y > k\)
\[\begin{align} Pr(y \gt 1) & = logistic(\eta - c_1) \\ Pr(y \gt 2) & = logistic(\eta - c_2) \\ Pr(y \gt 3) & = logistic(\eta - c_3) \\ \end{align}\]
\(\eta\) is a linear predictor of your choice. Thresholds/cutpoints \(c_k\) are constrained to increase to force the correct ordering of categories
\[0=c_1\lt c_2\lt c_3\]
So you start by calculating the likelihood of outcomes greater than or equal to a given outcome.
Then find the likelihood of each category
\[\begin{align} Pr(y=k) & = Pr(y > k-1) - Pr(y>k) \\ & = logistic(\eta - c_{k-1}) - logistic(\eta - c_k) \\ \end{align}\]
So for \(K=3\) categories
\[\text{OrderedLogistic}(k~|~\eta,c) = \left\{ \begin{array}{ll} 1 - \text{logistic}(\eta - c_1) & \text{if } k = 1, \\[4pt] \text{logistic}(\eta - c_1) - \text{logistic}(\eta - c_2) & \text{if } k=2, \text{and} \\[4pt] \text{logistic}(\eta - c_3) - 0 & \text{if } k = 3. \end{array} \right.\]
You get 1 of the categories for “free”, so you can just set the first cutpoint to 0.
So what does this model actually look like? If we consider the case where we have 3 categories and 2 transitions between them, we get these two curves - the first showing the transition from not flowering to flowering and the second the transition from flowering to done flowering. The “slope” parameter beta controls how quickly the transition occurs while the cutpoints determine the location of the curve - sliding the curve along the x-axis.
You can calculate the inflection point of each curve by dividing the cutpoints by the slope, which I call the transition point - \(H_1\) and \(H_2\) below. The transition point is the heatsum at which 50% of the trees have transitioned from one phase to the next (or a given tree is 50% likely to have transitioned).
An important assumption of the this model is constructed is that the slopes are the same for both transitions.
\[y \sim \mathrm{OrderedLogistic}(k|\eta,c) \\ \eta = \beta x \\ \beta = 0.2 \\ c_{2}=11\text{, }c_3=15\]
Ordered logistic with 3 categories and 2 transitions
So here’s a very simple version of an ordered logistic regression for phenology. This is only for the Prince George data and I’ve fit pollen shed and receptivity separately. - Phenological state has a categorical distribution with a cumulative link function - that’s ordered logistic - with linear predictor \(\eta\) and cutpoints \(c\) - \(\eta_i\) equals slope beta * heatsum - \(\beta\) must be between 0 and 1 and the prior is skewed towards 0. - \(c_k\) are ordered with a normal prior with mean 5 and sd 10. Cutpoints are labelled c1 for the transition between state 1 and state 2 and c2 for the transition between 2 and 3. - Then I derive 2 new parameters from the cutpoints and slope - H1 and H2. These describe the heatsum at which transition is 50% likely - so in our case, the point at which we expect half the trees to have transitioned from not flowering to flowering and from flowering to done flowering.
\[\begin{array}{ll} S_i&\sim \mathrm{OrderedLogistic}(\eta_i, c_k)\\ \eta_i&= \beta h_i \\ \beta&\sim \mathrm{beta}(1,1) \\ c_k&\sim \mathrm{normal}(5,10) \\ H_1 &= \beta/c_1\\ H_2 &= \beta/c_2\\ \end{array}\]
This graph does not show the cumulative proportion of phenophase 1. If it did, the line would represent the cumulative proportion of trees that were recorded as not yet flowering. Since all trees are actually not yet flowering at all heatsums prior to the time they start flowering, this curve isn’t very meaningful. It can also be calculated from the other curves - which is why we can fix the first cutpoint at 0 and only calculate the other 2.
I fit the model in stan - this is what the stancode for the model looks like. As you can see, there are some details associated with the stan model that are not included in the equations above - like bounding beta between 0 and 1 in addition to including a beta prior. This helps stan search better, but also reflects the state of our prior knowledge.
// ordered logistic model
data{
int<lower=2> K; // number of possible phenophases
int<lower=0> N; // number of phenophase observations
int<lower=1, upper=K> phenophase[N]; //phenophase outcomes (response)
real<lower=0> heatsum[N]; //heatsums
}
parameters{
real<lower=0, upper=1> beta;
ordered[K-1] c; //how many cutpoints are there
}
model {
c ~ normal(5, 10); //cutpoints prior
beta ~ beta(1,1); //beta prior
for (n in 1:N)
phenophase[n] ~ ordered_logistic(heatsum[n] * beta, c);
}
After fitting the model, here are the posterior distributions for the parameter values. The slope in the female model is lower than in the male model. The first cutpoint is lower in the male model in the female model and the second cutpoint is higher in the male model than in the female model. So what does that actually mean?
Well, if we sample our parameter values and use them to draw our transition curves/cumulative proportion curves, we get this. The female model is in the top graph and the male model in the bottom graph. The purple line shows the transition from not yet flowering to flowering and the yellow line shows the transition from flowering to finished. The grey bars are the point at which half of the population has transitioned. The width of the lines represent the distribution of the parameter values.
Model results
So if you look at the male model, you can see that pollen shed is beginning and ending before receptivity, but the models overlap a lot. I’m curious how this translates to days and what it looks like if heatsum increases more or less rapidly on the scale of days - I suspect this is how protandry occurs.
I need to test whether protandry can ever occur given the model fit I find since 1) it does actually occur in some years and 2) it’s something I want to report back to the seed orchard managers who sent me my data.
It’s not as apparent here as in the density plots earlier, but beta, the slope parameter, is lower in the female model. This means that receptivity transitions take longer than pollen shed transitions.
Now that the basic model is working in stan, I’m working on including all of my data and adding in the releveant levels/effects. The obvious first thing to add to the model is sex.
I also need to add provenance effects because I need to know if provenance actually affects pollination phenology response to heatsum. But there’s also site and clone to consider.
In this model, I fit separate slopes for males and females and predict different transition points (\(H_1\) and \(H_2\)) for provenances and clones.
\[\begin{array}{ll} S_i&\sim \mathrm{OrderedLogistic}(\eta_i, c_k)\\ \eta_i&= \beta_{sex} h_i + \alpha_{provenance} + \alpha_{clone} \\ \beta_{clone}&\sim \mathrm{beta}(1,1) \\ c_k&\sim \mathrm{normal}(5,10) \\ \alpha_{clone} &\sim \mathrm{normal}(0,1) \\ \alpha_{provenance} &\sim \mathrm{normal}(0,1) \\ \bf{H}_1 &= \beta_{sex}/(c_1+\alpha_{clone}+\alpha_{provenance})\\ \bf{H}_2 &= \beta_{sex}/(c_2+\alpha_{clone}+\alpha_{provenance}) \end{array}\]
I actually fit that model just before I recorded this presentation and it recaptures the difference in transition speed between male and female phenophases, and it looks like not all provenances are the same. But I’m not entirely sure I set that model up correctly and haven’t run all the post model checks that need running. So we’ll see if this holds up.
Importantly, this model is not yet a multilevel model. I need to add hyperpriors and parameters to allow pooling - e.g. letting all provenances improve estimates for any individual provenance.
I would like any feedback you have to give. I was having trouble getting access to the PCIC gridded weather data, but they’ve fixed their download tool. So I should be able to add all of my sites shortly and ultimately extend the model across the range of lodgepole pine and start looking at which populations are phenologically matched.
Please tell me what’s wrong with using this model, limitations and weaknesses I should be aware of that I did not mention.
Have you built some multilevel models? It would be nice to sit down and talk through how to add the different levels/effects appropriately. If you have done this in a Bayesian context, even better. I’m a little confused about how to choose some of the priors and things like centered vs non-centered fits and whether I should “center” my heatsum data.
I used a threshold of 5 degrees C when I calculated heatsum. I need to figure out how sensitive the model is to that. I would like to try adding it to the model directly, but I think the model could actually become non-identifiable if I include it directly. So I may need to just run several versions of the model to see.
Thank you very much for your time.